pacman::p_load(sf, maptools, raster, spatstat, tmap, kableExtra, tidyverse, funModeling, sfdep)Take-Home Excercise 01
1.0 Overview
Water is an essential part of our life, without which we won’t be able to survive for more than 3 days. Living in Singapore, we have access to clean drinkable water 24/7 and as a result we don’t realize the struggles of people who don’t have access to clean water at all. An an example of such are the people in Nigeria. Despite 70% of Nigerians having access to basic water services, more than half of them are contamintated (Reference).
For this assignment, we will be focusing on the State of Osun in Nigeria. Osun, located in the southwestern Nigeria is bounded to the east by Ekiti and Ondo states, Kwara on the north, Ogun to the south and to the west by Oyo State (Reference). Their economy is mainly based on the agriculture and it inhibits the Osun River, a sacred river. However, in the recent years, the river has been polluted by the several mining activities from the surrounding communities (Reference). Hence, it is integral for us to address the issue of providing clean and sustainable water to the people of Osun. Through this assignment, I aim to apply the relevant spatial point pattern analysis learned in class to analyse the Functional and Non-Functional water points in State of Osun, Nigera.

2.0 Setup
2.1 Packages Used
sf : Used for importing geospatial data, assigning or transforming coordinate systems, and converting geospatial and aspatial data into a sf data frame
tidyverse : Used for transforming and better presentation of Data
tmap : Used for plotting static point patterns maps or interactive maps
spatstat : Used for point-pattern analysis
raster : Used to read, write, manipulate, analyse and model gridded spatial data
maptools : Used to provide a set of tools for manipulating geographic data
kableExtra : Used for table customization
funModeling : Used to data cleaning, importance variable, analysis and model performance
sfdep :Used for functions creates not present in spdep.
2.2 Datasets Used
The below diagram shows the datasets used for the Assignment. We have two types of data - geospatial and aspatial.
For the Aspatial data, we are extracting the data from WPdx Global Data Repositories. The data source consists of two types of data - WPdx-Basic and WPdx+, for the purpose of this project, we will be using the WPdx+.
For the Geospatial data, we will be using the Nigeria Level-2 Administrative Boundary polygon features GIS data. There are two data source for this - Humanitarian Data Exchange (HDE) and geoBoundaries.
# initialise a dataframe of our geospatial and aspatial dataset details
datasets <- data.frame(
Type=c("Geospatial",
"Geospatial",
"Geospatial",
"Geospatial",
"Geospatial",
"Geospatial",
"Geospatial",
"Geospatial",
"Geospatial",
"Geospatial",
"Geospatial",
"Geospatial",
"Geospatial",
"Geospatial",
"Aspatial"),
Name=c("geoBoundaries-NGA-ADM2",
"geoBoundaries-NGA-ADM2",
"geoBoundaries-NGA-ADM2",
"geoBoundaries-NGA-ADM2",
"geoBoundaries-NGA-ADM2",
"geoBoundaries-NGA-ADM2",
"nga_admbnda_adm2_osgof_20190417",
"nga_admbnda_adm2_osgof_20190417",
"nga_admbnda_adm2_osgof_20190417",
"nga_admbnda_adm2_osgof_20190417",
"nga_admbnda_adm2_osgof_20190417",
"nga_admbnda_adm2_osgof_20190417",
"nga_admbnda_adm2_osgof_20190417",
"nga_admbnda_adm2_osgof_20190417",
"WPdx"),
Format=c(".dbf",
".geojson",
".prj",
".shp",
".shx",
".topojson",
".CPG",
".dbf",
".prj",
".sbn",
".sbx",
".shp",
".shp",
".shx",
".csv"),
Source=c("[geoBoundaries](https://www.geoboundaries.org/index.html#getdata)",
"[geoBoundaries](https://www.geoboundaries.org/index.html#getdata)",
"[geoBoundaries](https://www.geoboundaries.org/index.html#getdata)",
"[geoBoundaries](https://www.geoboundaries.org/index.html#getdata)",
"[geoBoundaries](https://www.geoboundaries.org/index.html#getdata)",
"[geoBoundaries](https://www.geoboundaries.org/index.html#getdata)",
"[Humanitarian Data Exchange](https://data.humdata.org/dataset/cod-ab-nga)",
"[Humanitarian Data Exchange](https://data.humdata.org/dataset/cod-ab-nga)",
"[Humanitarian Data Exchange](https://data.humdata.org/dataset/cod-ab-nga)",
"[Humanitarian Data Exchange](https://data.humdata.org/dataset/cod-ab-nga)",
"[Humanitarian Data Exchange](https://data.humdata.org/dataset/cod-ab-nga)",
"[Humanitarian Data Exchange](https://data.humdata.org/dataset/cod-ab-nga)",
"[Humanitarian Data Exchange](https://data.humdata.org/dataset/cod-ab-nga)",
"[Humanitarian Data Exchange](https://data.humdata.org/dataset/cod-ab-nga)",
"[ WPdx Global Data Repositories](https://www.waterpointdata.org/access-data/)")
)
# with reference to this guide on kableExtra:
# https://cran.r-project.org/web/packages/kableExtra/vignettes/awesome_table_in_html.html
# kable_material is the name of the kable theme
# 'hover' for to highlight row when hovering, 'scale_down' to adjust table to fit page width
library(knitr)
library(kableExtra)
kable(datasets, caption="Datasets Used") %>%
kable_material("hover", latex_options="scale_down")| Type | Name | Format | Source |
|---|---|---|---|
| Geospatial | geoBoundaries-NGA-ADM2 | .dbf | [geoBoundaries](https://www.geoboundaries.org/index.html#getdata) |
| Geospatial | geoBoundaries-NGA-ADM2 | .geojson | [geoBoundaries](https://www.geoboundaries.org/index.html#getdata) |
| Geospatial | geoBoundaries-NGA-ADM2 | .prj | [geoBoundaries](https://www.geoboundaries.org/index.html#getdata) |
| Geospatial | geoBoundaries-NGA-ADM2 | .shp | [geoBoundaries](https://www.geoboundaries.org/index.html#getdata) |
| Geospatial | geoBoundaries-NGA-ADM2 | .shx | [geoBoundaries](https://www.geoboundaries.org/index.html#getdata) |
| Geospatial | geoBoundaries-NGA-ADM2 | .topojson | [geoBoundaries](https://www.geoboundaries.org/index.html#getdata) |
| Geospatial | nga_admbnda_adm2_osgof_20190417 | .CPG | [Humanitarian Data Exchange](https://data.humdata.org/dataset/cod-ab-nga) |
| Geospatial | nga_admbnda_adm2_osgof_20190417 | .dbf | [Humanitarian Data Exchange](https://data.humdata.org/dataset/cod-ab-nga) |
| Geospatial | nga_admbnda_adm2_osgof_20190417 | .prj | [Humanitarian Data Exchange](https://data.humdata.org/dataset/cod-ab-nga) |
| Geospatial | nga_admbnda_adm2_osgof_20190417 | .sbn | [Humanitarian Data Exchange](https://data.humdata.org/dataset/cod-ab-nga) |
| Geospatial | nga_admbnda_adm2_osgof_20190417 | .sbx | [Humanitarian Data Exchange](https://data.humdata.org/dataset/cod-ab-nga) |
| Geospatial | nga_admbnda_adm2_osgof_20190417 | .shp | [Humanitarian Data Exchange](https://data.humdata.org/dataset/cod-ab-nga) |
| Geospatial | nga_admbnda_adm2_osgof_20190417 | .shp | [Humanitarian Data Exchange](https://data.humdata.org/dataset/cod-ab-nga) |
| Geospatial | nga_admbnda_adm2_osgof_20190417 | .shx | [Humanitarian Data Exchange](https://data.humdata.org/dataset/cod-ab-nga) |
| Aspatial | WPdx | .csv | [ WPdx Global Data Repositories](https://www.waterpointdata.org/access-data/) |
3.0 Data Wrangling : Geospatial Data
3.1 Importing and Transforming Geospatial Data
We will begin by importing Geospatial data into R by using the st_read() of sf package. It imports the nga_admbnda_adm2_osgof_20190417 shapefile into R as a polygon data frame. We provide 2 arguments - dsn (which is the data path) and layer (the shapefile name)
We use the st_transform() to perform projection transaction.
geoNGA <- st_read("data/geospatial",
layer = "geoBoundaries-NGA-ADM2")Reading layer `geoBoundaries-NGA-ADM2' from data source
`C:\mayurims\IS415-GAA\Take-Home_Ex\Take-Home_Ex01\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 774 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 2.668534 ymin: 4.273007 xmax: 14.67882 ymax: 13.89442
Geodetic CRS: WGS 84
NGA <- st_read(dsn = "data/geospatial",
layer = "nga_admbnda_adm2_osgof_20190417")Reading layer `nga_admbnda_adm2_osgof_20190417' from data source
`C:\mayurims\IS415-GAA\Take-Home_Ex\Take-Home_Ex01\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 774 features and 16 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 2.668534 ymin: 4.273007 xmax: 14.67882 ymax: 13.89442
Geodetic CRS: WGS 84
We can use the glimpse() of dplyr to know more about the associated attribute information of the dataframe.
glimpse(geoNGA)Rows: 774
Columns: 7
$ shapeName <chr> "Aba North", "Aba South", "Arochukwu", "Bende", "Ikwuano", …
$ pcode <chr> "NG001001", "NG001002", "NG001003", "NG001004", "NG001005",…
$ level <chr> "ADM2", "ADM2", "ADM2", "ADM2", "ADM2", "ADM2", "ADM2", "AD…
$ shapeID <chr> "NGA-ADM2-13203401B25860527", "NGA-ADM2-13203401B76240303",…
$ shapeGroup <chr> "NGA", "NGA", "NGA", "NGA", "NGA", "NGA", "NGA", "NGA", "NG…
$ shapeType <chr> "ADM2", "ADM2", "ADM2", "ADM2", "ADM2", "ADM2", "ADM2", "AD…
$ geometry <MULTIPOLYGON [°]> MULTIPOLYGON (((7.387495 5...., MULTIPOLYGON (…
glimpse(NGA)Rows: 774
Columns: 17
$ Shape_Leng <dbl> 0.2370744, 0.2624772, 3.0753158, 2.5379842, 0.6871498, 1.06…
$ Shape_Area <dbl> 0.0015239210, 0.0035311037, 0.3268678399, 0.0683785064, 0.0…
$ ADM2_EN <chr> "Aba North", "Aba South", "Abadam", "Abaji", "Abak", "Abaka…
$ ADM2_PCODE <chr> "NG001001", "NG001002", "NG008001", "NG015001", "NG003001",…
$ ADM2_REF <chr> "Aba North", "Aba South", "Abadam", "Abaji", "Abak", "Abaka…
$ ADM2ALT1EN <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ ADM2ALT2EN <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ ADM1_EN <chr> "Abia", "Abia", "Borno", "Federal Capital Territory", "Akwa…
$ ADM1_PCODE <chr> "NG001", "NG001", "NG008", "NG015", "NG003", "NG011", "NG02…
$ ADM0_EN <chr> "Nigeria", "Nigeria", "Nigeria", "Nigeria", "Nigeria", "Nig…
$ ADM0_PCODE <chr> "NG", "NG", "NG", "NG", "NG", "NG", "NG", "NG", "NG", "NG",…
$ date <date> 2016-11-29, 2016-11-29, 2016-11-29, 2016-11-29, 2016-11-29…
$ validOn <date> 2019-04-17, 2019-04-17, 2019-04-17, 2019-04-17, 2019-04-17…
$ validTo <date> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ SD_EN <chr> "Abia South", "Abia South", "Borno North", "Federal Capital…
$ SD_PCODE <chr> "NG00103", "NG00103", "NG00802", "NG01501", "NG00302", "NG0…
$ geometry <MULTIPOLYGON [°]> MULTIPOLYGON (((7.401109 5...., MULTIPOLYGON (…
From the attributed visible, we can see the HDE source (NGA) has a column called ‘ADM1_EN’ which can be used to filter for water points in Osun, Nigeria. However, this is not present in the geoBoundaries dataset. As a result, we will be using the Humanitarian Data Exchange source, where we get the files - nga_admbnda_adm2.
NGA sf data.frame consists of many redundent fields. The code chunk below uses select() of dplyr to retain column 3, 4, 8 and 9.
NGA <- NGA %>%
select(c(3:4, 8:9))We then use the filter() to filter out the polygon features of Osun.
NGA <- NGA %>% filter(ADM1_EN == "Osun")Now, we use the st_crs() to check the coordinate system of the data. As we can see, it uses the WGS 84 coordinate system. The data is using a Geographic projected system, however, this is system is not appropriate since we need to use distance and area measures.
st_crs(NGA)Coordinate Reference System:
User input: WGS 84
wkt:
GEOGCRS["WGS 84",
DATUM["World Geodetic System 1984",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
CS[ellipsoidal,2],
AXIS["latitude",north,
ORDER[1],
ANGLEUNIT["degree",0.0174532925199433]],
AXIS["longitude",east,
ORDER[2],
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4326]]
Hence, we use st_transform() and not st_set_crs() as st_set_crs() assigns the EPSG code to the dataframe, however, now we need to transform the dataframe from geographic to projected coordinate system. We will be using crs=26392 (found from the EPSGfor Nigeria).
NGA <- st_transform(NGA, crs = 26392)Verify that the CRS of NGA dataframe has changed.
st_crs(NGA)Coordinate Reference System:
User input: EPSG:26392
wkt:
PROJCRS["Minna / Nigeria Mid Belt",
BASEGEOGCRS["Minna",
DATUM["Minna",
ELLIPSOID["Clarke 1880 (RGS)",6378249.145,293.465,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4263]],
CONVERSION["Nigeria Mid Belt",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",4,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",8.5,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",0.99975,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",670553.98,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",0,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["(E)",east,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["(N)",north,
ORDER[2],
LENGTHUNIT["metre",1]],
USAGE[
SCOPE["Engineering survey, topographic mapping."],
AREA["Nigeria between 6°30'E and 10°30'E, onshore and offshore shelf."],
BBOX[3.57,6.5,13.53,10.51]],
ID["EPSG",26392]]
3.2 Data Pre-processing
3.2.1 Dropping Invalid Dimensions
Since, we only have one dataframe, there are no invalid dimensions, and hence, this step is not required.
3.2.2 Invalid Geometries
The st_is_valid() function checks whether a geometry is valid and returns the indices. Whereas, the length gives you a count of the indices with invalid geometries.
length(which(st_is_valid(NGA) == FALSE))[1] 0
None of the values are Invalid, so we are good to go!!
3.2.3 Checking for Duplicated Names
We need to check for duplicate name in the data main data fields. Using duplicated() of Base R, we can flag out LGA names that might be duplicated as shown in the code chunk below.
NGA$ADM2_EN[duplicated(NGA$ADM2_EN)==TRUE]character(0)
There are no duplicated values, so we are good to go!
3.2.4 Initial Visualization
plot(st_geometry(NGA))
4.0 Data Wrangling : Aspatial Data
4.1 Importing Aspatial Data
Since the WPdx data is in CSV format, we will use read_csv() of readr package to import WPdx.csv. The output is called wp_nga and is a tibble dataframe
wp_nga <- read_csv("data/aspatial/WPdx.csv") %>%
filter(`#clean_country_name` == "Nigeria" & `#clean_adm1` == "Osun")4.2 Converting water point data into sf point features
Converting an aspatial data into an sf data.frame involves two steps.
First, we need to convert the wkt field into sfc field by using st_as_sfc() function. The function stores it in a tibble data format.
wp_nga$Geometry = st_as_sfc(wp_nga$`New Georeferenced Column`)
wp_nga# A tibble: 5,557 × 71
row_id `#source` #lat_…¹ #lon_…² #repo…³ #stat…⁴ #wate…⁵ #wate…⁶ #wate…⁷
<dbl> <chr> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr>
1 429123 GRID3 8.02 5.06 08/29/… Unknown <NA> <NA> Tapsta…
2 70566 Federal Minis… 7.32 4.79 05/11/… No Protec… Well Mechan…
3 70578 Federal Minis… 7.76 4.56 05/11/… No Boreho… Well Mechan…
4 66401 Federal Minis… 8.03 4.64 04/30/… No Boreho… Well Mechan…
5 422190 GRID3 7.87 4.88 08/29/… Unknown <NA> <NA> Tapsta…
6 422064 GRID3 7.7 4.89 08/29/… Unknown <NA> <NA> Tapsta…
7 65607 Federal Minis… 7.89 4.71 05/12/… No Boreho… Well Mechan…
8 68989 Federal Minis… 7.51 4.27 05/07/… No Boreho… Well <NA>
9 67708 Federal Minis… 7.48 4.35 04/29/… Yes Boreho… Well Mechan…
10 66419 Federal Minis… 7.63 4.50 05/08/… Yes Boreho… Well Hand P…
# … with 5,547 more rows, 62 more variables: `#water_tech_category` <chr>,
# `#facility_type` <chr>, `#clean_country_name` <chr>, `#clean_adm1` <chr>,
# `#clean_adm2` <chr>, `#clean_adm3` <chr>, `#clean_adm4` <chr>,
# `#install_year` <dbl>, `#installer` <chr>, `#rehab_year` <lgl>,
# `#rehabilitator` <lgl>, `#management_clean` <chr>, `#status_clean` <chr>,
# `#pay` <chr>, `#fecal_coliform_presence` <chr>,
# `#fecal_coliform_value` <dbl>, `#subjective_quality` <chr>, …
Next, we use the st_sf() to convert the tibble data.frame into an sf object. It is also important for us to include the referencing system of the data into the sf object.
wp_sf <- st_sf(wp_nga, crs=4326)
wp_sfSimple feature collection with 5557 features and 70 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 4.032004 ymin: 7.060309 xmax: 5.06 ymax: 8.061898
Geodetic CRS: WGS 84
# A tibble: 5,557 × 71
row_id `#source` #lat_…¹ #lon_…² #repo…³ #stat…⁴ #wate…⁵ #wate…⁶ #wate…⁷
* <dbl> <chr> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr>
1 429123 GRID3 8.02 5.06 08/29/… Unknown <NA> <NA> Tapsta…
2 70566 Federal Minis… 7.32 4.79 05/11/… No Protec… Well Mechan…
3 70578 Federal Minis… 7.76 4.56 05/11/… No Boreho… Well Mechan…
4 66401 Federal Minis… 8.03 4.64 04/30/… No Boreho… Well Mechan…
5 422190 GRID3 7.87 4.88 08/29/… Unknown <NA> <NA> Tapsta…
6 422064 GRID3 7.7 4.89 08/29/… Unknown <NA> <NA> Tapsta…
7 65607 Federal Minis… 7.89 4.71 05/12/… No Boreho… Well Mechan…
8 68989 Federal Minis… 7.51 4.27 05/07/… No Boreho… Well <NA>
9 67708 Federal Minis… 7.48 4.35 04/29/… Yes Boreho… Well Mechan…
10 66419 Federal Minis… 7.63 4.50 05/08/… Yes Boreho… Well Hand P…
# … with 5,547 more rows, 62 more variables: `#water_tech_category` <chr>,
# `#facility_type` <chr>, `#clean_country_name` <chr>, `#clean_adm1` <chr>,
# `#clean_adm2` <chr>, `#clean_adm3` <chr>, `#clean_adm4` <chr>,
# `#install_year` <dbl>, `#installer` <chr>, `#rehab_year` <lgl>,
# `#rehabilitator` <lgl>, `#management_clean` <chr>, `#status_clean` <chr>,
# `#pay` <chr>, `#fecal_coliform_presence` <chr>,
# `#fecal_coliform_value` <dbl>, `#subjective_quality` <chr>, …
Like step 3.2 Data Pre-processing, we transform the projection from wgs84 to the appropriate projected coordinate system of Nigeria.
wp_sf <- wp_sf %>%
st_transform(crs = 26392)4.3 Data Wrangling for Water Data Point
Exploratory Data Analysis (EDA) helps to gain initial understanding of the data. The freq() of funModeling package is used to reveal the distribution of water point status visually.
freq(data = wp_sf,
input = '#status_clean')
#status_clean frequency percentage cumulative_perc
1 Functional 2319 41.73 41.73
2 Non-Functional 2008 36.13 77.86
3 <NA> 748 13.46 91.32
4 Functional but needs repair 248 4.46 95.78
5 Non-Functional due to dry season 151 2.72 98.50
6 Functional but not in use 63 1.13 99.63
7 Abandoned 15 0.27 99.90
8 Abandoned/Decommissioned 5 0.09 100.00
The diagram shows that there are nine classes present in the ‘status_clean’ field. Hence, now we will be performing data wrangling tasks to create 3 data object - Functional, Non-Functional and Unknown.
We use rename() function from the dplyr package to rename the column from #status_clean to status_clean for easier handling in subsequent steps. select() is used to include status_clean in the output sf data.frame. We use the mutate() and replace_na() functions to recode all the NA values in status_clean into unknown.
wp_sf_nga <- wp_sf %>%
rename(status_clean = '#status_clean') %>%
select(status_clean) %>%
mutate(status_clean = replace_na(
status_clean, "unknown"))4.3.1 Extracting Water Point Data
Now we are ready to extract the water point data according to their status.
The code chunk below is used to extract functional water point.
wp_functional <- wp_sf_nga %>%
filter(status_clean %in%
c("Functional",
"Functional but not in use",
"Functional but needs repair"))The code chunk below is used to extract nonfunctional water point.
wp_nonfunctional <- wp_sf_nga %>%
filter(status_clean %in%
c("Abandoned/Decommissioned",
"Abandoned",
"Non-Functional due to dry season",
"Non-Functional",
"Non functional due to dry season"))The code chunk below is used to extract water point with unknown status.
wp_unknown <- wp_sf_nga %>%
filter(status_clean == "unknown")Performing a quick EDA on the derived sfa.dataframes
freq(data = wp_functional,
input = 'status_clean')
status_clean frequency percentage cumulative_perc
1 Functional 2319 88.17 88.17
2 Functional but needs repair 248 9.43 97.60
3 Functional but not in use 63 2.40 100.00
freq(data = wp_nonfunctional,
input = 'status_clean')
status_clean frequency percentage cumulative_perc
1 Non-Functional 2008 92.15 92.15
2 Non-Functional due to dry season 151 6.93 99.08
3 Abandoned 15 0.69 99.77
4 Abandoned/Decommissioned 5 0.23 100.00
freq(data = wp_unknown,
input = 'status_clean')
status_clean frequency percentage cumulative_perc
1 unknown 748 100 100
We can see from the map below, the proportion of functional and non-functional water is quite similar.
tmap_mode("view")
tm_shape(wp_functional) +
tm_dots(col = "status_clean",
pal = "blue",
title = "Functional") +
tm_shape(wp_nonfunctional) +
tm_dots(col = "status_clean",
pal = "red",
title = "Non-Functional") +
tm_view(set.zoom.limits = c(8.5,15)) 4.3.2 Performing Point-In Polygon Count
Next, we want to find out the number of total, functional, nonfunctional and unknown water points in Osun State. This is performed in the following code chunk. First, it identifies the functional water points in each LGA by using st_intersects() of sf package. Next, length() is used to calculate the number of functional water points that fall inside each LGA.
NGA_wp <- NGA %>%
mutate(`total_wp` = lengths(
st_intersects(NGA, wp_sf_nga))) %>%
mutate(`wp_functional` = lengths(
st_intersects(NGA, wp_functional))) %>%
mutate(`wp_nonfunctional` = lengths(
st_intersects(NGA, wp_nonfunctional))) %>%
mutate(`wp_unknown` = lengths(
st_intersects(NGA, wp_unknown)))
NGA_wpSimple feature collection with 30 features and 8 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 176503.2 ymin: 331434.7 xmax: 291043.8 ymax: 454520.1
Projected CRS: Minna / Nigeria Mid Belt
First 10 features:
ADM2_EN ADM2_PCODE ADM1_EN ADM1_PCODE geometry
1 Aiyedade NG030001 Osun NG030 MULTIPOLYGON (((213526.6 34...
2 Aiyedire NG030002 Osun NG030 MULTIPOLYGON (((212542.6 40...
3 Atakumosa East NG030003 Osun NG030 MULTIPOLYGON (((265746.8 37...
4 Atakumosa West NG030004 Osun NG030 MULTIPOLYGON (((248871.4 40...
5 Boluwaduro NG030005 Osun NG030 MULTIPOLYGON (((266092.2 43...
6 Boripe NG030006 Osun NG030 MULTIPOLYGON (((255072.5 43...
7 Ede North NG030007 Osun NG030 MULTIPOLYGON (((236386.9 41...
8 Ede South NG030008 Osun NG030 MULTIPOLYGON (((236386.9 41...
9 Egbedore NG030009 Osun NG030 MULTIPOLYGON (((220756 4317...
10 Ejigbo NG030010 Osun NG030 MULTIPOLYGON (((214422.1 42...
total_wp wp_functional wp_nonfunctional wp_unknown
1 389 157 154 78
2 175 89 57 29
3 223 98 92 33
4 246 111 103 32
5 129 63 51 15
6 177 79 85 13
7 216 141 50 25
8 146 72 39 35
9 142 63 44 35
10 434 274 126 34
We then visualise attributes by using statistcal graph. We use functions of ggplot2 package to reveal the distribution of total water points in Osun’s LGA using histogram.
ggplot(data = NGA_wp,
aes(x = total_wp)) +
geom_histogram(bins=20,
color="black",
fill="light blue") +
geom_vline(aes(xintercept=mean(
total_wp, na.rm=T)),
color="red",
linetype="dashed",
size=0.8) +
ggtitle("Distribution of total water points") +
xlab("No. of water points") +
ylab("No. of\nLGAs") +
theme(axis.title.y=element_text(angle = 0))
4.4 Choropleth Mapping
We will be calculating the proportion of Functional and Non-Functional water points and mapping them to see which area of Osun has more proportions of functional and non-functional water points.
NGA_wp_total <- NGA_wp %>%
mutate(pct_functional = wp_functional/total_wp) %>%
mutate(pct_nonfunctional = wp_nonfunctional/total_wp)tm_shape(NGA_wp_total) +
tm_fill("pct_functional",
n = 10,
style = "equal",
palette = "Blues",
legend.hist = TRUE) +
tm_borders(lwd = 0.1,
alpha = 1) +
tm_layout(main.title = "Rate map of functional water point by LGAs",
legend.outside = TRUE)tm_shape(NGA_wp_total) +
tm_fill("pct_nonfunctional",
n = 10,
style = "equal",
palette = "Blues",
legend.hist = TRUE) +
tm_borders(lwd = 0.1,
alpha = 1) +
tm_layout(main.title = "Rate map of non-functional water point by LGAs",
legend.outside = TRUE)5.0 Combined Data Wrangling : Geospatial & Aspatial Data
This is an essential step, which is the process of getting data from its raw input into a format which can be used for anlaysis.
5.1 Converting sf data frames to sp’s Spatial* Class
We use the as_spatial() function to convert the three geospatial data from simple feature data frame to sp’s Spatial* class.
wp_functional_spatial = as_Spatial(wp_functional)
wp_nonfunctional_spatial = as_Spatial(wp_nonfunctional)
NGA_spatial <- as_Spatial(NGA)NGA_spatialclass : SpatialPolygonsDataFrame
features : 30
extent : 176503.2, 291043.8, 331434.7, 454520.1 (xmin, xmax, ymin, ymax)
crs : +proj=tmerc +lat_0=4 +lon_0=8.5 +k=0.99975 +x_0=670553.98 +y_0=0 +a=6378249.145 +rf=293.465 +towgs84=-92,-93,122,0,0,0,0 +units=m +no_defs
variables : 4
names : ADM2_EN, ADM2_PCODE, ADM1_EN, ADM1_PCODE
min values : Aiyedade, NG030001, Osun, NG030
max values : Osogbo, NG030030, Osun, NG030
wp_functional_spatialclass : SpatialPointsDataFrame
features : 2630
extent : 177285.9, 290751, 343128.1, 450859.7 (xmin, xmax, ymin, ymax)
crs : +proj=tmerc +lat_0=4 +lon_0=8.5 +k=0.99975 +x_0=670553.98 +y_0=0 +a=6378249.145 +rf=293.465 +towgs84=-92,-93,122,0,0,0,0 +units=m +no_defs
variables : 1
names : status_clean
min values : Functional
max values : Functional but not in use
wp_nonfunctional_spatialclass : SpatialPointsDataFrame
features : 2179
extent : 180539, 290616, 340054.1, 450780.1 (xmin, xmax, ymin, ymax)
crs : +proj=tmerc +lat_0=4 +lon_0=8.5 +k=0.99975 +x_0=670553.98 +y_0=0 +a=6378249.145 +rf=293.465 +towgs84=-92,-93,122,0,0,0,0 +units=m +no_defs
variables : 1
names : status_clean
min values : Abandoned
max values : Non-Functional due to dry season
5.2 Converting from Spatial* classes to sp format
In order to use the spatstat for our analysis, we need our data to be in the ppp object form. Hence, we first need to convert them into Spatial object first and then into ppp object.
# convert into respective sp (in our case, either polygons or points)
wp_functional_sp <- as(wp_functional_spatial, "SpatialPoints")
wp_nonfunctional_sp <- as(wp_nonfunctional_spatial, "SpatialPoints")
NGA_sp <-as(NGA_spatial, "SpatialPolygons")wp_functional_spclass : SpatialPoints
features : 2630
extent : 177285.9, 290751, 343128.1, 450859.7 (xmin, xmax, ymin, ymax)
crs : +proj=tmerc +lat_0=4 +lon_0=8.5 +k=0.99975 +x_0=670553.98 +y_0=0 +a=6378249.145 +rf=293.465 +towgs84=-92,-93,122,0,0,0,0 +units=m +no_defs
wp_nonfunctional_spclass : SpatialPoints
features : 2179
extent : 180539, 290616, 340054.1, 450780.1 (xmin, xmax, ymin, ymax)
crs : +proj=tmerc +lat_0=4 +lon_0=8.5 +k=0.99975 +x_0=670553.98 +y_0=0 +a=6378249.145 +rf=293.465 +towgs84=-92,-93,122,0,0,0,0 +units=m +no_defs
NGA_spclass : SpatialPolygons
features : 30
extent : 176503.2, 291043.8, 331434.7, 454520.1 (xmin, xmax, ymin, ymax)
crs : +proj=tmerc +lat_0=4 +lon_0=8.5 +k=0.99975 +x_0=670553.98 +y_0=0 +a=6378249.145 +rf=293.465 +towgs84=-92,-93,122,0,0,0,0 +units=m +no_defs
5.3 Converting from sp format to spatstat ppp format
We can’t convert SpatialPolygons to ppp format - nor is there any need to. Hence, we won’t be including our ‘base map’, NGA.
# from sp object, convert into ppp format
wp_functional_ppp <- as(wp_functional_sp, "ppp")
wp_nonfunctional_ppp <- as(wp_nonfunctional_sp, "ppp")The below map shows the point paterns for both functional and non-functional water points.
par(mfrow=c(1,2))
plot(wp_nonfunctional_ppp)
plot(wp_functional_ppp)
5.3.1 Handling Duplicated Points + Jittering
any(duplicated(wp_functional_ppp)) [1] FALSE
any(duplicated(wp_nonfunctional_ppp)) [1] FALSE
Since there is no duplication, we dont have to apply the process of Jittering.
5.4 Creating Owin Object
We need to now confine the analysis with a geographical area - Osun State and we do this by creating a object called owin which represent the polygonal region. The below code covert the SpatialPolygon (NGA_sp) created into an owin object.
NGA_owin <- as(NGA_sp, "owin")
plot(NGA_owin)
5.5 Combining point events object and owin object
In this step, we extract the functional and non-functional water points that are located within Osun, Nigeria. This combines both the point and polygon feature into one ppp object class.
wp_functional_ppp = wp_functional_ppp[NGA_owin]
wp_nonfunctional_ppp = wp_nonfunctional_ppp[NGA_owin]par(mfrow=c(1,2))
plot(wp_nonfunctional_ppp)
plot(wp_functional_ppp)
6.0 Exploratory Spatial Data Analysis
In this section, we use the Hands-on Excercise 04 to help
Derive kernel density maps of functional and non-functional water points. Using appropriate tmap functions,
Display the kernel density maps on openstreetmap of Osub State, Nigeria.
Describe the spatial patterns revealed by the kernel density maps. Highlight the advantage of kernel density map over point map.
6.1 Kernel Density Estimation
6.1.1 Computing Kernel Density Estimation
There are two types of bandwidth methods - Fixed (Automatic) and Adaptive bandwidth method. These methods employ different uniform bases in density calculation.
Computing using Automatic Bandwidth selection method
We can compute the kernel density by using the bw.ppl() or bw.diggle(). As learned in Chapter 04, the ppl() method is prefered for patterns consisting predominantly of prominent clusters. Whereas, bw.diggle() is best used to detect a single tight cluster in the midst of random noise.
From the maps below, we can see that bw.ppl() method is better able to identify the prominent clusters as the data does not contain a single tight cluster. Hence, we will be using the bw.ppl() method.
bw.diggle() Method
kde_wpfunctional_bw_diggle <- density(wp_functional_ppp,
sigma=bw.diggle,
edge=TRUE,
kernel="gaussian")
kde_wpnonfunctional_bw_diggle <- density(wp_nonfunctional_ppp,
sigma=bw.diggle,
edge=TRUE,
kernel="gaussian")
par(mfrow=c(1,2))
plot(kde_wpfunctional_bw_diggle,
main = "Functional Water Points",
ribside=c("right"))
plot(kde_wpnonfunctional_bw_diggle,
main = "Non-Functional Water Points",
ribside=c("right"))
bw.ppl() Method
kde_wpfunctional_bw <- density(wp_functional_ppp,
sigma=bw.ppl,
edge=TRUE,
kernel="gaussian")
kde_wpnonfunctional_bw <- density(wp_nonfunctional_ppp,
sigma=bw.ppl,
edge=TRUE,
kernel="gaussian")
par(mfrow=c(1,2))
plot(kde_wpfunctional_bw,
main = "Functional Water Points",
ribside=c("right"))
plot(kde_wpnonfunctional_bw,
main = "Non-Functional Water Points",
ribside=c("right"))
Computing using Adaptive Bandwidth selection method
kde_wpfunctional_adaptive <- adaptive.density(wp_functional_ppp, method="kernel")
kde_wpnonfunctional_adaptive <- adaptive.density(wp_nonfunctional_ppp, method="kernel")
par(mfrow=c(1,2))
plot(kde_wpfunctional_adaptive,
main = "Functional Water Points",
ribside=c("right"))
plot(kde_wpnonfunctional_adaptive,
main = "Non-Functional Water Points",
ribside=c("right"))
Comparing Automated and Adapting Bandwidth Methods (side-by-side)
par(mfrow=c(1,2))
plot(kde_wpfunctional_bw,
main = "Functional Water Points - Automated",
ribside=c("right"))
plot(kde_wpfunctional_adaptive,
main = "Functional Water Points - Adaptive",
ribside=c("right"))
par(mfrow=c(1,2))
plot(kde_wpnonfunctional_bw,
main = "Non-Functional Water Points - Automated",
ribside=c("right"))
plot(kde_wpnonfunctional_adaptive,
main = "Non-Functional Water Points - Adaptive",
ribside=c("right"))
6.1.2 Rescalling KDE Values
As we can the KDE values are small (ranging from 0 to 0.000035). This is because the default unit of measurement of svy21 is in meter. As a result, the density values computed is in “number of points per square meter”. So rescale() is used to covert the unit of measurement from meter to kilometer.
wp_functional_ppp_km <- rescale(wp_functional_ppp, 1000, "km")
wp_nonfunctional_ppp_km <- rescale(wp_nonfunctional_ppp, 1000, "km")Now we re-plot the graphs
kde_wpfunctional_km <- density(wp_functional_ppp_km,
sigma=bw.ppl,
edge=TRUE,
kernel="gaussian")
kde_wpnonfunctional_km <- density(wp_nonfunctional_ppp_km,
sigma=bw.ppl,
edge=TRUE,
kernel="gaussian")
par(mfrow=c(1,2))
plot(kde_wpfunctional_bw,
main = "Functional Water Points",
ribside=c("right"))
plot(kde_wpnonfunctional_bw,
main = "Non-Functional Water Points",
ribside=c("right"))
kde_wpfunctional_adaptive_km <- adaptive.density(wp_functional_ppp_km, method="kernel")
kde_wpnonfunctional_adaptive_km <- adaptive.density(wp_nonfunctional_ppp_km, method="kernel")
par(mfrow=c(1,2))
plot(kde_wpfunctional_adaptive,
main = "Functional Water Points",
ribside=c("right"))
plot(kde_wpnonfunctional_adaptive,
main = "Non-Functional Water Points",
ribside=c("right"))
For this assignment, we will be using the Automated Bandwidth method because it defines its base in geographical space, where as the Adaptive method defines it in population (Reference). As we learned in Chapter 04, Automated Bandwidth is very sensitive to highly skew distribution of spatial point patterns over geographical units (e.g - urban versus rural). However, since we don’t have highly skewed data (as seen in the distribution graph in 4.3.2 Performing Point-In Polygon Count), we can use Fixed/Automated Bandwidth method.
6.2 Converting KDE output into grid object
gridded_wpfunctional <- as.SpatialGridDataFrame.im(kde_wpfunctional_km)
gridded_wpnonfunctional <- as.SpatialGridDataFrame.im(kde_wpnonfunctional_km)
spplot(gridded_wpfunctional)
spplot(gridded_wpnonfunctional)
6.2.1 Converting Gridded Output into Raster
Next, we convert the gridded kernal density objects into RasterLayer object by using raster() of raster package.
kde_wpfunctional_raster <- raster(gridded_wpfunctional)
kde_wpfunctional_rasterclass : RasterLayer
dimensions : 128, 128, 16384 (nrow, ncol, ncell)
resolution : 0.8948485, 0.9616045 (x, y)
extent : 176.5032, 291.0438, 331.4347, 454.5201 (xmin, xmax, ymin, ymax)
crs : NA
source : memory
names : v
values : -4.99773e-16, 10.55944 (min, max)
kde_wpnonfunctional_raster <- raster(gridded_wpnonfunctional)
kde_wpnonfunctional_rasterclass : RasterLayer
dimensions : 128, 128, 16384 (nrow, ncol, ncell)
resolution : 0.8948485, 0.9616045 (x, y)
extent : 176.5032, 291.0438, 331.4347, 454.5201 (xmin, xmax, ymin, ymax)
crs : NA
source : memory
names : v
values : -2.52505e-16, 9.25861 (min, max)
6.2.2 Assigning Projection Systems
projection(kde_wpfunctional_raster) <- CRS("+init=EPSG:26392 +datum:WGS84 +units=km")
kde_wpfunctional_rasterclass : RasterLayer
dimensions : 128, 128, 16384 (nrow, ncol, ncell)
resolution : 0.8948485, 0.9616045 (x, y)
extent : 176.5032, 291.0438, 331.4347, 454.5201 (xmin, xmax, ymin, ymax)
crs : +proj=tmerc +lat_0=4 +lon_0=8.5 +k=0.99975 +x_0=670553.98 +y_0=0 +a=6378249.145 +rf=293.465 +units=km +no_defs
source : memory
names : v
values : -4.99773e-16, 10.55944 (min, max)
projection(kde_wpnonfunctional_raster) <- CRS("+init=EPSG:26392 +datum:WGS84 +units=km")
kde_wpnonfunctional_rasterclass : RasterLayer
dimensions : 128, 128, 16384 (nrow, ncol, ncell)
resolution : 0.8948485, 0.9616045 (x, y)
extent : 176.5032, 291.0438, 331.4347, 454.5201 (xmin, xmax, ymin, ymax)
crs : +proj=tmerc +lat_0=4 +lon_0=8.5 +k=0.99975 +x_0=670553.98 +y_0=0 +a=6378249.145 +rf=293.465 +units=km +no_defs
source : memory
names : v
values : -2.52505e-16, 9.25861 (min, max)
6.3 Kernel Density Maps on OpenStreetMap
Now, as the assignment requirements has specified, we should plot our kernel density maps on OpenStreetMap. Since we’ll be plotting a lot of kernel density maps, let’s create a function:
density_map <- function(raster_object, map_title) {
tmap_mode("view")
tm_basemap("OpenStreetMap") +
tm_shape(raster_object) +
tm_raster("v", alpha=0.9) +
tm_layout(legend.position = c("right", "bottom"),
legend.height = 0.5,
legend.width = 0.4,
main.title = map_title,
main.title.position = 'center',
main.title.size = 1,
frame = TRUE) +
tm_view(set.zoom.limits = c(8, 13))
} kde_wpfunctional_density_map <- density_map(kde_wpfunctional_raster, map_title = "Functional Water Points in Osun State")
kde_wpnonfunctional_density_map <- density_map(kde_wpnonfunctional_raster, map_title = "Non-Functional Water Points in Osun State")kde_wpfunctional_density_mapkde_wpnonfunctional_density_maptmap_mode('plot')
tm_basemap("OpenStreetMap") +
tm_shape(kde_wpfunctional_raster) +
tm_raster("v")
tmap_mode('plot')
tm_basemap("OpenStreetMap") +
tm_shape(kde_wpnonfunctional_raster) +
tm_raster("v")
6.4 Kernel Density Maps Analysis
As we can see in the map in 4.3 Data Wrangling for Water Data Point, the number of functional and non-functional water points are quite close, with there being 2319 Functional and 2008 Non-functional water points. As a result, both the density maps are similar. But what is interesting to note is that, there seems to be more concentrated density points for the Non-Functional water points, compared to the Functional water points. Further, from the density maps above, we can see that both the Functional and Non-Functional water points are relatively more concentrated in the center and the upper part of Osun and we don’t see that many water points in lower part of Osun. This patter is reflected in the map in [5.5 Combining point events object and owin object], where we see less concentration of points in the lower part of Osun.
6.5 Advantage of Kernel Density Map over Point Map
To understand the advantage of Kernel Density Map over Point Map, we first need to plot the two and compare the differences.
tmap_mode("plot")
tm_shape(NGA_wp) +
tm_borders(alpha = 0.5) +
tmap_options(check.and.fix = TRUE) +
tm_shape(wp_nonfunctional) +
tm_dots(col="red", size=0.05) +
tm_layout(main.title = "Non-Functional Water Points",
main.title.position = "center",
main.title.size = 1.2,
frame = TRUE)
kde_wpnonfunctional_density_map
With the Kernel Density Map, denser areas with a heavier distribution of Non-Functional Water Points are easily spotted. This is because the kernel density z-estimate helps to smooth out the points in a given area. Compared to the point map which just shows the points. Further, the gradient colour available (ranging from yellow to green) helps in understanding the density/concentration of water pumps in the area. It clearly shows the viewer which are the areas with more non-functional water pumps, however, with the point map, the users have to gauge/estimate which are the denser with more non-functional water points.
Further, another advantage of Kernel Density Maps is that it uses the Inverse Distance Weighed method which is estimating cell values by using a linearly weighted combination of a set of sample points (Reference). This is quite useful, because it takes into consideration of water points that are further away (from the residential area) and hence, people might have to travel further to access the water points. And this is accounted by the kernel function, and not the Point Map.
Hence to conclude, the Kernal Density provides a quantitative value representing the concentration of points, where as this can only be observed/gauged in Point Map.
6.6 Colocation of Functional and Non-Functional Water Points
To find the colocation of functional and non-functional water point, the ‘unknown’ water points are not required, and hence, we use the filter() to remove them.
wp_sf_withoutUnknown <- wp_sf_nga %>% filter(!status_clean=='unknown')In the code chunk below, st_knn() of sfdep package is used to determine the k (i.e. 6) nearest neighbours for given point geometry. The function st_kernel_weights() is used to derive a weights list by using a kernel function. To compute LCLQ, the reference point data must be in either character or vector list. We create two vector lists - one of Functional and for Non-Functional water point and are called A and B respectively. The code local_colocation() os used to compute the LCLQ values for each water point. Before we can plot the LCLQ values their p-values, we need to join the output of local_colocation() to the stores sf data.frame.
# This is required for Take Home Excercise 3
nb <- include_self(
st_knn(st_geometry(wp_sf_withoutUnknown),6))
wt <- st_kernel_weights(nb,
wp_sf_withoutUnknown,
"gaussian",
adaptive = TRUE)
A <- wp_functional$status_clean
B <- wp_nonfunctional$status_clean
LCLQ <- local_colocation(A, B, nb, wt, 49)
LCLQ_stores <- cbind(wp_sf_withoutUnknown, LCLQ)The below map plots the LCLQ analysis
tmap_mode("view")
tm_shape(NGA) +
tm_polygons() +
tm_shape(LCLQ_stores)+
tm_dots(col = "Non.Functional",
size = 0.05,
border.col = "grey",
border.lwd = 0.5) +
tm_view(set.zoom.limits = c(8,11))tmap_mode("view")
tm_shape(NGA) +
tm_polygons() +
tm_shape(LCLQ_stores)+
tm_dots(col = "Abandoned",
size = 0.05,
border.col = "grey",
border.lwd = 0.5) +
tm_view(set.zoom.limits = c(9,13))tmap_mode("view")
tm_shape(NGA) +
tm_polygons() +
tm_shape(LCLQ_stores)+
tm_dots(col = "Non.Functional.due.to.dry.season",
size = 0.05,
border.col = "grey",
border.lwd = 0.5) +
tm_view(set.zoom.limits = c(9,11))The above maps show the colocation of Functional Water Points and Non-Functional Water Points. This means that Non-functional water points in color (‘Non-Functional’, ‘Abandoned’, ‘Non-functional due to dry season’) are surrounded by several Functional water points and hence, are colocated. The above code evaluates each Non-Functional water point individually for colocation with the presence of Functional water points. Hence, if the number of Non-functional water points within the neighbourhood of Functional water points are higher than the global proportion of Non-Functional water points, the colocation point will be higher. As we can see from the above maps, there are many Non-functional water points colocated with Functional water points compared to that of ‘Abandoned’ or ‘Non-functional due to dry season’.
7.0 Second-order Spatial Point Patterns Analysis
For Functional Water Point in Osun Sate
H0: The distribution of the Functional Water Points are randomly distributed
H1: The distribution of the Functional Water Points are not randomly distributed
Confidence level : 99%
Significance level : 0.01
For Non-Functional Water Point in Osun Sate
H0: The distribution of the Non-Functional Water Points are randomly distributed
H1: The distribution of the Non-Functional Water Points are not randomly distributed
Confidence level : 99%
Significance level : 0.01
The null hypothesis will be rejected if p-value is smaller than alpha value of 0.001.
7.1 Analysing Spatial Point Process Using G-Function
We will be using the G-Function for analyse the spatial point process. This function deals with the cumulative distribution of the nearest neighbor distances. It computes the nearest neighbor distance for each event, which is then sorted from smallest to largest. This is used to construct a cumilative distribution.
7.1.1 Functional Water Point
Computing G-function estimation
G_wp_functional = Gest(wp_functional_ppp, correction = "border")
plot(G_wp_functional, xlim=c(0,500))
Performing Complete Spatial Randomness Test
G_wp_functional.csr <- envelope(wp_functional_ppp, Gest, nsim = 999)Generating 999 simulations of CSR ...
1, 2, 3, ......10.........20.........30.........40.........50.........60........
.70.........80.........90.........100.........110.........120.........130......
...140.........150.........160.........170.........180.........190.........200....
.....210.........220.........230.........240.........250.........260.........270..
.......280.........290.........300.........310.........320.........330.........340
.........350.........360.........370.........380.........390.........400........
.410.........420.........430.........440.........450.........460.........470......
...480.........490.........500.........510.........520.........530.........540....
.....550.........560.........570.........580.........590.........600.........610..
.......620.........630.........640.........650.........660.........670.........680
.........690.........700.........710.........720.........730.........740........
.750.........760.........770.........780.........790.........800.........810......
...820.........830.........840.........850.........860.........870.........880....
.....890.........900.........910.........920.........930.........940.........950..
.......960.........970.........980.........990........ 999.
Done.
plot(G_wp_functional.csr)
Conclusion: The gray band shows for every distance, the smallest value and the largest value for of G(r) that is obtained out of 1000 simulations. The observed G(r) is far above the G(theo) as well as the envelope - indicating that Functional Water Points are clustered. Hence, we reject the null hypothesis that Functional Water Points are randomly distributed at 99% confident interval. Since the G(r) is above the randomization envelope, that is there are a lot of functional water points that intervent distances, as a result the curve climbs up very quickly. This suggests clustering.
7.1.2 Non-Functional Water Point
Computing G-function estimation
G_wp_nonfunctional = Gest(wp_nonfunctional_ppp, correction = "border")
plot(G_wp_nonfunctional, xlim=c(0,500))
Performing Complete Spatial Randomness Test
G_wp_nonfunctional.csr <- envelope(wp_nonfunctional_ppp, Gest, nsim = 999)Generating 999 simulations of CSR ...
1, 2, 3, ......10.........20.........30.........40.........50.........60........
.70.........80.........90.........100.........110.........120.........130......
...140.........150.........160.........170.........180.........190.........200....
.....210.........220.........230.........240.........250.........260.........270..
.......280.........290.........300.........310.........320.........330.........340
.........350.........360.........370.........380.........390.........400........
.410.........420.........430.........440.........450.........460.........470......
...480.........490.........500.........510.........520.........530.........540....
.....550.........560.........570.........580.........590.........600.........610..
.......620.........630.........640.........650.........660.........670.........680
.........690.........700.........710.........720.........730.........740........
.750.........760.........770.........780.........790.........800.........810......
...820.........830.........840.........850.........860.........870.........880....
.....890.........900.........910.........920.........930.........940.........950..
.......960.........970.........980.........990........ 999.
Done.
plot(G_wp_nonfunctional.csr)
Conclusion: The observed G(r) is far above the G(theo) as well as the envelope - indicating that Non-Functional Water Points are clustered. Hence, we reject the null hypothesis that Non Functional Water Points are randomly distributed at 99% confident interval.
7.2 Analysing Spatial Point Process Using L-Function
7.2.1 Functional Water Point
#|eval: false
#L_wp = Lest(wp_functional_ppp, correction = "Ripley")
#plot(L_wp, . -r ~ r,
#ylab= "L(d)-r", xlab = "d(m)")#|eval: false
#L_wp.csr <- envelope(wp_functional_ppp, Lest, nsim = 39, rank = 1, glocal=TRUE)#|eval: false
#plot(L_wp.csr, . - r ~ r, xlab="d", ylab="L(d)-r")Non functional L test
#L_nonwp = Lest(wp_nonfunctional_ppp, correction = "Ripley")
#plot(L_ck_wp, . -r ~ r,
#ylab= "L(d)-r", xlab = "d(m)")#|eval: false
#L_nonwp.csr <- envelope(wp_nonfunctional_ppp, Lest, nsim = 39, rank = 1, glocal=TRUE)#|eval: false
#plot(L_nonwp.csr, . - r ~ r, xlab="d", ylab="L(d)-r")8.0 Spatial Correlation Analysis
8.1 Data Pre-Processing
For this, we will be using the wp_sf_nga dataframe which has all the water points. We will first convert sf data frames to sp’s Spatial class
wp_spatial <- as_Spatial(wp_sf_nga)Convert spatial class into generic sp class
wp_sp <- as(wp_spatial, "SpatialPoints")Converting generic sp format into spatstat’s ppp format
wp_ppp <- as(wp_sp, "ppp")
wp_pppPlanar point pattern: 5557 points
window: rectangle = [177285.9, 291287.05] x [340054.1, 450859.7] units
plot(wp_ppp)
For this analysis, we will be working with marked data, and we know that the values are categorical (different water points), we need to ensure that the marked field is of factor data type. However, as seen from the output, our status_clean field is of chr data type, not factor! Hence, we will use the as.factor() function:
wp_spatial@data$status_clean <-as.factor(wp_spatial@data$status_clean)We then convert our spatial data datafram intto ppp format and create an owin object.
wp_spatial_marked_ppp <- as(wp_spatial, "ppp")wp_spatial_marked_ppp = wp_spatial_marked_ppp[NGA_owin]plot(wp_spatial_marked_ppp, which.marks = "status_clean")
We use the density() to compute the kernel density objects, and the use plot() to plot it out. Further, we convert the meter to kilometers using rescale()
plot((density(split(rescale(wp_spatial_marked_ppp, 1000)))))
Before we proceed with our second order spatial analysis, we need to assign marks to the ppp objects + check the levels of the marks:
levels(marks(wp_spatial_marked_ppp))[1] "Abandoned" "Abandoned/Decommissioned"
[3] "Functional" "Functional but needs repair"
[5] "Functional but not in use" "Non-Functional"
[7] "Non-Functional due to dry season" "unknown"
As we can see from the marks, and the above density maps, there are 7 levels here. However, we need them to be only classified into 3 - Functional, Non-Functional and unknown. Hence, we will be using the below code to rename the variable.
levels(marks(wp_spatial_marked_ppp))[levels(marks(wp_spatial_marked_ppp)) == "Abandoned"] <- "Non-Functional"levels(marks(wp_spatial_marked_ppp))[levels(marks(wp_spatial_marked_ppp)) == "Abandoned/Decommissioned"] <- "Non-Functional"levels(marks(wp_spatial_marked_ppp))[levels(marks(wp_spatial_marked_ppp)) == "Non-Functional due to dry season"] <- "Non-Functional"levels(marks(wp_spatial_marked_ppp))[levels(marks(wp_spatial_marked_ppp)) == "Functional but needs repair"] <- "Functional"levels(marks(wp_spatial_marked_ppp))[levels(marks(wp_spatial_marked_ppp)) == "Functional but not in use"] <- "Functional"Now, upon running the below code, we can see that all the levels are categorisied into our 3 desired levels.
levels(wp_spatial_marked_ppp[["marks"]])[1] "Non-Functional" "Functional" "unknown"
plot(wp_spatial_marked_ppp, which.marks = "status_clean")
plot(density(split(wp_spatial_marked_ppp)))
From the above density maps, we can observe a relationship between Functional and Non-Functional water points - in the sense that the there seems to be a lot of water Functional water points where ever there are Non-Functional water points. They seem to coexist together quite a lot of times, and hence indicating some level of dependence between them. Hence, to investigate this, we will be using the Cross K-Function.
H0: The distribution of Functional and Non-Functional water points are spatially independent
H1: The distribution of Functional and Non-Functional water points are not spatially independent
Confidence Level: 99%
Significance Level: 0.01%
The null-hypothesis will be rejected if the p-value is smaller than alpha value of 0.01.
wp_spatial_marked_ppp_Lcross.csr <- envelope(wp_spatial_marked_ppp,
Lcross,
i="Functional",
j="Non-Functional",
correction="border",
nsim=999)Generating 999 simulations of CSR ...
1, 2, 3, ......10 [etd 3:43] .........20 [etd 3:58] .........
30 [etd 3:52] .........40 [etd 3:58] .........50 [etd 3:58] ........
.60 [etd 3:58] .........70 [etd 3:57] .........80 [etd 3:52] .......
..90 [etd 3:49] .........100 [etd 3:44] .........110 [etd 3:42] ......
...120 [etd 3:37] .........130 [etd 3:38] .........140 [etd 3:36] .....
....150 [etd 3:34] .........160 [etd 3:31] .........170 [etd 3:28] ....
.....180 [etd 3:26] .........190 [etd 3:22] .........200 [etd 3:20] ...
......210 [etd 3:16] .........220 [etd 3:14] .........230 [etd 3:10] ..
.......240 [etd 3:08] .........250 [etd 3:06] .........260 [etd 3:04] .
........270 [etd 3:03] .........280 [etd 2:59] .........290
[etd 2:56] .........300 [etd 2:53] .........310 [etd 2:50] .........
320 [etd 2:47] .........330 [etd 2:45] .........340 [etd 2:42] ........
.350 [etd 2:40] .........360 [etd 2:37] .........370 [etd 2:34] .......
..380 [etd 2:32] .........390 [etd 2:29] .........400 [etd 2:27] ......
...410 [etd 2:25] .........420 [etd 2:22] .........430 [etd 2:20] .....
....440 [etd 2:17] .........450 [etd 2:15] .........460 [etd 2:12] ....
.....470 [etd 2:10] .........480 [etd 2:07] .........490 [etd 2:05] ...
......500 [etd 2:02] .........510 [etd 2:00] .........520 [etd 1:58] ..
.......530 [etd 1:56] .........540 [etd 1:53] .........550 [etd 1:51] .
........560 [etd 1:48] .........570 [etd 1:46] .........580
[etd 1:43] .........590 [etd 1:40] .........600 [etd 1:38] .........
610 [etd 1:36] .........620 [etd 1:33] .........630 [etd 1:31] ........
.640 [etd 1:28] .........650 [etd 1:26] .........660 [etd 1:23] .......
..670 [etd 1:21] .........680 [etd 1:18] .........690 [etd 1:16] ......
...700 [etd 1:13] .........710 [etd 1:11] .........720 [etd 1:08] .....
....730 [etd 1:06] .........740 [etd 1:03] .........750 [etd 1:01] ....
.....760 [etd 58 sec] .........770 [etd 56 sec] .........780 [etd 54 sec] ...
......790 [etd 51 sec] .........800 [etd 49 sec] .........810 [etd 46 sec] ..
.......820 [etd 44 sec] .........830 [etd 41 sec] .........840 [etd 39 sec] .
........850 [etd 37 sec] .........860 [etd 34 sec] .........870
[etd 32 sec] .........880 [etd 29 sec] .........890 [etd 27 sec] .........
900 [etd 24 sec] .........910 [etd 22 sec] .........920 [etd 19 sec] ........
.930 [etd 17 sec] .........940 [etd 14 sec] .........950 [etd 12 sec] .......
..960 [etd 10 sec] .........970 [etd 7 sec] .........980 [etd 5 sec] ......
...990 [etd 2 sec] ........ 999.
Done.
plot(wp_spatial_marked_ppp_Lcross.csr, xlab="distance(m)", xlim=c(0,500))
Conclusion : The Functional and Non-Functional water point are not statistically independent as the empirical k-cross line is outside of the envelope of 99% confidence level, and for that we reject the null hypothesis.
9.0 References
Professor Kam Tin Seong’s Hands-on Excercises
Selection of bandwidth type and adjustment side in kernel density estimation over inhomogeneous backgrounds (Paper)
How IDW works (Reference)